    A Computationally Efficient Method for Determining Significance in Interval Mapping of Quantitative Trait Loci

    This paper provides a brief introduction to the mapping of quantitative trait loci (QTL). An example of mapping QTL for root thickness in rice illustrates popular statistical methods used in QTL mapping. Interval mapping is used together with permutation testing to detect significant associations between genetic positions and quantitative traits while controlling the overall type I error rate. A recent technique that can greatly reduce the computational expense of permutation testing in QTL mapping is reviewed. Theory is provided for an extension of recent results that may lead to more powerful methods of QTL mapping through permutation testing.
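    The basic permutation-testing step can be sketched in Python. This is a minimal illustration, not the paper's method or the faster technique it reviews: it assumes a simple absolute-correlation statistic at each marker as a stand-in for the interval-mapping LOD score, and the names `abs_corr`, `genome_max_stat`, and `permutation_threshold` are hypothetical. The idea shown is the standard one: permute the trait values, record the genome-wide maximum statistic for each permutation, and use the (1 - alpha) quantile of those maxima as the significance threshold.

```python
import math
import random


def abs_corr(x, y):
    """Absolute Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def genome_max_stat(genotypes, trait):
    """Maximum association statistic over all tested positions.

    Assumption: |correlation| with 0/1 marker codes stands in for a LOD score.
    """
    return max(abs_corr(g, trait) for g in genotypes)


def permutation_threshold(genotypes, trait, n_perm=1000, alpha=0.05, seed=0):
    """(1 - alpha) quantile of the genome-wide maximum under permuted traits.

    Shuffling the trait breaks any genotype-trait link but keeps the
    dependence among positions, so comparing the observed maximum with this
    threshold controls the overall (genome-wide) type I error at about alpha.
    """
    rng = random.Random(seed)
    y = list(trait)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(y)
        maxima.append(genome_max_stat(genotypes, y))
    maxima.sort()
    k = max(1, math.ceil((1 - alpha) * n_perm))
    return maxima[k - 1]


if __name__ == "__main__":
    # Hypothetical toy data: 5 markers on 40 individuals; marker 0 drives the trait.
    rng = random.Random(1)
    n = 40
    genotypes = [[i % 2 for i in range(n)]] + [
        [rng.randint(0, 1) for _ in range(n)] for _ in range(4)
    ]
    trait = [3.0 * g + 0.1 * (i % 3) for i, g in enumerate(genotypes[0])]
    thr = permutation_threshold(genotypes, trait, n_perm=200)
    print("threshold:", thr, "observed max:", genome_max_stat(genotypes, trait))
```

    The cost is n_perm full genome scans, which is the computational expense that faster techniques such as the one reviewed in the paper aim to reduce.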

    Accounting for spot matching uncertainty in the analysis of proteomics data from two-dimensional gel electrophoresis

    Two-dimensional gel electrophoresis is a biochemical technique that combines isoelectric focusing and SDS-polyacrylamide gel technology to separate protein mixtures simultaneously on the basis of isoelectric point and molecular weight. Upon staining, each protein on a gel can be characterized by an intensity measurement that reflects its abundance in the mixture. In principle, these intensities can then be used to determine which proteins are differentially expressed under different experimental conditions. We propose an EM approach to identify differentially expressed proteins using an inferential strategy that accounts for uncertainty in matching spots to proteins across gels. The underlying mixture model has trivariate Gaussian components. Applying EM is, however, not straightforward: the main difficulty lies in the E-step calculations, because proteins within each gel are dependent. The usual model-based clustering approach is therefore inapplicable, and an MCMC approach is employed. Through data-based simulation, we demonstrate that our proposed method effectively accounts for uncertainty in spot matching and distinguishes differentially from non-differentially expressed proteins more successfully than a naïve t-test that ignores uncertainty in spot matching.
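    For contrast, the naïve baseline mentioned above can be sketched in a few lines of Python. It treats each protein's matched spot intensities as known and error-free, which is exactly the assumption the proposed method relaxes. The abstract does not specify the test variant or data layout, so a Welch (unequal-variance) t test and a dictionary input are assumptions here, and `welch_t` and `naive_spot_tests` are hypothetical names. The EM/MCMC method itself is not reproduced.

```python
import statistics


def welch_t(a, b):
    """Welch two-sample t statistic and Welch-Satterthwaite degrees of freedom.

    Each group needs at least two values and nonzero variance.
    """
    va = statistics.variance(a) / len(a)  # squared standard error, group A
    vb = statistics.variance(b) / len(b)  # squared standard error, group B
    t = (statistics.mean(a) - statistics.mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def naive_spot_tests(matched):
    """Per-protein t tests that take the spot-to-protein matching as given.

    matched maps a protein id to (intensities under condition A,
    intensities under condition B), one matched spot value per gel.
    """
    return {pid: welch_t(a, b) for pid, (a, b) in matched.items()}


if __name__ == "__main__":
    # Hypothetical intensities for two proteins across three gels per condition.
    data = {"p1": ([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]),
            "p2": ([2.1, 1.9, 2.0], [2.0, 2.2, 1.8])}
    for pid, (t, df) in naive_spot_tests(data).items():
        print(pid, round(t, 3), round(df, 2))
```

    Because a mismatched spot silently contributes another protein's intensity to these tests, errors in matching feed straight into the t statistics; this is the weakness the abstract's comparison targets.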

    Stability of Random Forests and Coverage of Random-Forest Prediction Intervals

    We establish stability of random forests under the mild condition that the squared response (Y^2) does not have a heavy tail. In particular, our analysis holds for the practical version of random forests that is implemented in popular packages like randomForest in R. Empirical results show that stability may persist even beyond our assumption and hold for heavy-tailed Y^2. Using the stability property, we prove a non-asymptotic lower bound for the coverage probability of prediction intervals constructed from the out-of-bag error of random forests. With another mild condition that is typically satisfied when Y is continuous, we also establish a complementary upper bound, which can be similarly established for the jackknife prediction interval constructed from an arbitrary stable algorithm. We also discuss the asymptotic coverage probability under assumptions weaker than those considered in previous literature. Our work implies that random forests, with their stability property, are an effective machine learning method that can provide not only satisfactory point prediction but also justified interval prediction at almost no extra computational cost.
    Comment: NeurIPS 202
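    One common way to build a prediction interval from out-of-bag errors is sketched below in Python. It assumes the interval is the point prediction shifted by the empirical alpha/2 and 1 - alpha/2 quantiles of the OOB residuals y_i minus the OOB prediction for case i; the paper's exact construction may differ. The OOB residuals are taken as input (for example, computed from the OOB predictions that randomForest in R stores in its `predicted` component), and `empirical_quantile` and `oob_prediction_interval` are hypothetical names.

```python
import math


def empirical_quantile(sorted_vals, p):
    """Inverse-CDF (type 1) quantile of an ascending list, for 0 < p < 1."""
    k = max(1, math.ceil(p * len(sorted_vals)))
    return sorted_vals[k - 1]


def oob_prediction_interval(point_pred, oob_residuals, alpha=0.1):
    """Nominal (1 - alpha) interval: point_pred plus lower/upper OOB residual quantiles.

    oob_residuals[i] is assumed to be y_i minus the out-of-bag prediction for
    training case i, i.e. a prediction made only by trees that did not use case i.
    No refitting is needed, which is why the interval costs almost nothing extra.
    """
    r = sorted(oob_residuals)
    lower = empirical_quantile(r, alpha / 2)
    upper = empirical_quantile(r, 1 - alpha / 2)
    return point_pred + lower, point_pred + upper


if __name__ == "__main__":
    # Hypothetical OOB residuals from a fitted forest, and a new point prediction of 10.0.
    residuals = [-1.2, 0.4, 0.1, -0.3, 0.9, -0.6, 0.2, 1.5, -0.8, 0.0]
    print(oob_prediction_interval(10.0, residuals, alpha=0.2))
```

    Using the two tail quantiles separately, rather than a symmetric quantile of absolute residuals, lets the interval follow skewed error distributions.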